Phonetic speaker recognition using maximum-likelihood binary-decision tree models
Abstract
Recent work in phonetic speaker recognition has shown that modeling phone sequences with n-grams is a viable and effective approach to speaker recognition, aimed primarily at capturing speaker-dependent pronunciation as well as word usage. This paper describes a method that uses binary-tree-structured statistical models to extend the phonetic context beyond that of standard n-grams (particularly bigrams). The models exploit statistical dependencies within a longer sequence window without the exponential growth in model complexity that n-grams incur. Two ways of dealing with data sparsity are also studied, namely model adaptation and recursive bottom-up smoothing of symbol distributions. Results obtained under a variety of experimental conditions on the NIST 2001 Speaker Recognition Extended Data Task show consistent improvements in equal-error rate (EER) over standard bigram models. The approach confirms the relevance of long phonetic context in phonetic speaker recognition and represents an intermediate stage between short phone context and word-level modeling. Because it requires no lexical knowledge, it is likely to be language independent.
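The abstract only outlines the model, so the following Python sketch illustrates the general idea under stated assumptions. A tree predicts the next phone from a window of preceding phones. Each internal node asks a binary question about one context position, and splits are chosen greedily to maximize training log-likelihood. Node distributions are smoothed by recursive interpolation with their ancestors, and a test sequence is scored by a per-phone log-likelihood ratio against a background tree. The function names (`grow`, `prob`, `score`), the restriction to singleton questions ("is the phone at position k equal to p?"), the `<s>` padding symbol, and the fixed interpolation weight `lam=0.7` are hypothetical choices for illustration. They are not the authors' implementation, and model adaptation is not shown.

```python
import math
from collections import Counter

PAD = "<s>"  # hypothetical start-of-sequence padding symbol

def contexts(seq, window):
    """Return (context, next_phone) pairs; each context is the `window` preceding phones."""
    padded = [PAD] * window + list(seq)
    return [(tuple(padded[i - window:i]), padded[i]) for i in range(window, len(padded))]

def log_lik(counts):
    """Log-likelihood of the samples in `counts` under their own ML distribution."""
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)

class Node:
    def __init__(self, samples, parent=None):
        self.counts = Counter(nxt for _, nxt in samples)  # next-phone counts at this node
        self.parent = parent
        self.question = None  # (position, phone): "is context[position] == phone?"
        self.yes = None
        self.no = None

def grow(samples, parent=None, min_samples=4, min_gain=1e-6):
    """Grow a binary tree greedily, splitting on the question with the largest ML gain."""
    node = Node(samples, parent)
    if len(samples) < 2 * min_samples:
        return node
    base = log_lik(node.counts)
    window = len(samples[0][0])
    best = None
    # Assumption: only singleton questions about one context position are tried.
    for k, phone in sorted({(k, ctx[k]) for ctx, _ in samples for k in range(window)}):
        yes = [s for s in samples if s[0][k] == phone]
        no = [s for s in samples if s[0][k] != phone]
        if len(yes) < min_samples or len(no) < min_samples:
            continue
        gain = (log_lik(Counter(nxt for _, nxt in yes))
                + log_lik(Counter(nxt for _, nxt in no)) - base)
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, (k, phone), yes, no)
    if best is not None:
        _, node.question, yes, no = best
        node.yes = grow(yes, node, min_samples, min_gain)
        node.no = grow(no, node, min_samples, min_gain)
    return node

def leaf(node, ctx):
    """Descend from the root to the leaf selected by the context."""
    while node.question is not None:
        k, phone = node.question
        node = node.yes if ctx[k] == phone else node.no
    return node

def prob(node, phone, vocab_size, lam=0.7):
    """Smoothed P(phone | node): interpolate the node's ML estimate with its parent's
    smoothed estimate, recursing up to the root, which backs off to a uniform distribution."""
    n = sum(node.counts.values())
    ml = node.counts[phone] / n if n else 0.0
    if node.parent is not None:
        backoff = prob(node.parent, phone, vocab_size, lam)
    else:
        backoff = 1.0 / vocab_size
    return lam * ml + (1 - lam) * backoff

def avg_log_prob(tree, seq, window, vocab_size):
    """Average per-phone log-probability of a sequence under a tree model."""
    pairs = contexts(seq, window)
    return sum(math.log(prob(leaf(tree, c), nxt, vocab_size)) for c, nxt in pairs) / len(pairs)

def score(speaker_tree, background_tree, seq, window, vocab_size):
    """Per-phone log-likelihood ratio; positive values favor the target speaker."""
    return (avg_log_prob(speaker_tree, seq, window, vocab_size)
            - avg_log_prob(background_tree, seq, window, vocab_size))

# Toy usage with made-up phone strings (illustrative data, not from the paper).
VOCAB = ["a", "b", "c", "d"]
W = 2
speaker = grow(contexts("a b c".split() * 20, W))
background = grow(contexts("a b d c".split() * 10 + "c b a d".split() * 10, W))
test_seq = "a b c".split() * 5
print(round(score(speaker, background, test_seq, W, len(VOCAB)), 3))
```

In this sketch the tree splits only where the training data support a likelihood gain. Widening the window `W` therefore enlarges the candidate question set roughly linearly (about `W` times the number of phone symbols) instead of multiplying the parameter count as a higher-order n-gram would. That is the trade-off the abstract refers to.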
Similar resources
Adaptive decision tree-based phone cluster models for speaker clustering
This study presents an approach to speaker clustering using adaptive decision tree-based phone cluster models (DT-PCMs). First, a large broadcast news database is used to train a set of phone models for universal speakers. The multi-space probability distributed-hidden Markov model (MSD-HMM) is adopted for phone modeling. Confusing phone models are merged into phone clusters. Next, for each sta...
Probabilistic state clustering using conditional random field for context-dependent acoustic modelling
Hidden Markov Models are widely used in speech recognition systems. Due to the co-articulation effects of continuous speech, context-dependent models have been found to yield performance improvements. One major issue with context-dependent acoustic modelling is the robust parameter estimation of unseen or rare models in the training data. Typically, decision tree state clustering is used to ensu...
Comparison of Artificial Neural Network, Decision Tree and Bayesian Network Models in Regional Flood Frequency Analysis using L-moments and Maximum Likelihood Methods in Karkheh and Karun Watersheds
Proper flood discharge forecasting is significant for the design of hydraulic structures, reducing the risk of failure, and minimizing downstream environmental damage. The objective of this study was to investigate the application of machine learning methods in Regional Flood Frequency Analysis (RFFA). To achieve this goal, 18 physiographic, climatic, lithological, and land use parameters were ...
A discriminative splitting criterion for phonetic decision trees
Phonetic decision trees are a key concept in acoustic modeling for large vocabulary continuous speech recognition. Although discriminative training has become a major line of research in speech recognition and all state-of-the-art acoustic models are trained discriminatively, the conventional phonetic decision tree approach still relies on the maximum likelihood principle. In this paper we deve...
Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation
Modern automatic speech recognition systems use Gaussian mixture models (GMM) on acoustic observations to model the probability of producing a given observation under any one of many hidden discrete phonetic states. This paper investigates the feasibility of using an acoustic decision tree to directly model these probabilities. Unlike the more common phonetic decision tree, which asks questions...
Publication date: 2003